'
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"
},
"height": 400,
"width": 500,
"title": "Mean Max Temperature Per Month",
"mark": "circle" ,
"encoding": {
"x": {"field": "month", "type": "temporal", "timeUnit": "month","title": "Month"},
"y": {"field": "temp_max", "type": "quantitative", "aggregate": "average", "title": "Temperature (F)"},
"color": {"field": "year", "title": "Year"},
"fillOpacity": {"value": 0.8},
"size": {"value": 80}
}
}' |> as_vegaspec()
DATA 304 Homework 5
Exercise 1
Copy of Graphic:
Source: r/dataisbeautiful
What marks are being used? What variables are mapped to which properties?
The marks are lines, one for each “worst day.” The variables are mapped as follows:
time after occurrence (years) -> x
percentage change -> y
Worst Day Occurrence -> color
What is the main story of this graphic?
The main story of the graphic is how long the stock market took to recover after three of its “worst days” between 1987 and 2020. After the drop in 2008’s Great Recession, it took almost 4 years to return to its state before the drop. 1987’s Black Monday took approximately 2 years, and COVID-19’s crash recovered less than a year later.
What makes it a good graphic?
I really like that the x axis is not a datetime. Instead, it counts the number of years after each worst day. I think this makes the most sense for comparing recovery times, because with a datetime on the x axis the comparison would be hard: the worst days are multiple years apart. I also like that the colors contrast clearly with each other.
What features do you think you would know how to implement in Vega-Lite?
I would create a line graph, layering a line for each of the “worst days,” and then define cyan, pink, and yellow as the colors for the days. I would use "background": "black" to change the background color. If the data recorded dates rather than recovery times, I would calculate the difference between each event’s first date and the date in each row of the dataset, so that the days can be layered on top of each other on a shared x axis.
I would also have to add a layer of text for the lower right text that explains the graphs.
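The plan above can be sketched roughly as follows. This is only a minimal sketch, not a reproduction of the graphic. The field names (event, start, date, pct_change), the event labels, and every data value are made-up placeholders rather than real market data, and which color lands on which event is arbitrary.

```r
library(vegawidget)  # provides as_vegaspec()

'
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "background": "black",
  "config": {
    "axis": {"labelColor": "white", "titleColor": "white"},
    "legend": {"labelColor": "white", "titleColor": "white"}
  },
  "width": 500,
  "height": 300,
  "layer": [
    {
      "data": {
        "values": [
          {"event": "Event A", "start": "2000-01-03", "date": "2000-01-03", "pct_change": -30},
          {"event": "Event A", "start": "2000-01-03", "date": "2002-01-03", "pct_change": -15},
          {"event": "Event A", "start": "2000-01-03", "date": "2004-01-03", "pct_change": 0},
          {"event": "Event B", "start": "2010-06-01", "date": "2010-06-01", "pct_change": -20},
          {"event": "Event B", "start": "2010-06-01", "date": "2011-06-01", "pct_change": -10},
          {"event": "Event B", "start": "2010-06-01", "date": "2012-06-01", "pct_change": 0},
          {"event": "Event C", "start": "2018-03-01", "date": "2018-03-01", "pct_change": -25},
          {"event": "Event C", "start": "2018-03-01", "date": "2018-09-01", "pct_change": -5},
          {"event": "Event C", "start": "2018-03-01", "date": "2019-01-01", "pct_change": 0}
        ]
      },
      "transform": [
        {
          "calculate": "(toDate(datum.date) - toDate(datum.start)) / (1000 * 60 * 60 * 24 * 365.25)",
          "as": "years_after"
        }
      ],
      "mark": "line",
      "encoding": {
        "x": {"field": "years_after", "type": "quantitative", "title": "Years after the worst day"},
        "y": {"field": "pct_change", "type": "quantitative", "title": "Percentage change"},
        "color": {
          "field": "event",
          "type": "nominal",
          "title": "Worst day",
          "scale": {"range": ["cyan", "pink", "yellow"]}
        }
      }
    },
    {
      "data": {"values": [{}]},
      "mark": {
        "type": "text",
        "text": "Explanatory note goes here",
        "x": "width",
        "y": "height",
        "align": "right",
        "baseline": "bottom",
        "dx": -5,
        "dy": -5,
        "color": "white"
      }
    }
  ]
}' |> as_vegaspec()
```

The calculate transform converts the gap between each row’s date and its event’s start date from milliseconds to years, which is what lets the three lines share one x axis. The second layer has its own one-row dataset so that the note is drawn once in the lower right corner.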
Are there any features of the graphic that you would not know how to do in Vega-Lite? If so, list them.
I am not sure how to add the “extra” lines on the x axis. Most of my graphics just use Vega-Lite’s default formatting. I would also have to learn how to cut off the lines so that they do not go above 0% on the y axis.
Exercise 2
Visualizing Weather 1
I am having issues with my graphs. My first graphic is the one to be graded; I am attaching the rest so that I can try to debug them.
Create a graphic that shows the mean temperature for each month. How many “months” should you be displaying? (There is more than one answer to this – perhaps try doing it more than one way.)
Way 1
In this graphic, I show the mean max temperature per month. Color is used as a guide to show the year for each month.
Way 2
Here I use R to calculate the actual mean temperature for each day, and then aggregate by month in Vega-Lite.
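library(jsonlite)    # toJSON()
library(vegawidget)  # as_vegaspec()

# Read the data and compute the daily mean from the daily max and min
seattle_data <- read.csv("https://calvin-data304.netlify.app/data/weather-with-dates.csv")
seattle_data$mean_daily_temp <- (seattle_data$temp_max + seattle_data$temp_min) / 2

# Serialize the data frame and splice it into the spec as inline values
seattle_weather_json <- toJSON(seattle_data, pretty = TRUE)
seattle_mean_temp <- paste0(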
'{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {"values": ', seattle_weather_json, '},
"height": 400,
"width": 500,
"title": "Mean Temperature Per Month",
"mark": "circle",
"encoding": {
"x": {"field": "month", "type": "ordinal"},
"y": {"field": "mean_daily_temp", "type": "quantitative", "aggregate": "average", "title": "Temperature (F)"},
"color": {"field": "year", "title": "Year"},
"fillOpacity": {"value": 0.8},
"size": {"value": 80}
}
}'
) |> as_vegaspec()
seattle_mean_temp  # an assignment alone prints nothing, so print the spec to display the chart
Note: I cannot get this to render and can’t find a solution online. From what I found, the cause may be my package versions or my R version. Sometimes it works and sometimes it doesn’t.
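One alternative to pasting JSON together by hand in Way 2 is to build the spec as a nested R list and let vegawidget serialize it, data frame included. This is a sketch, not a confirmed fix for the rendering problem. It assumes the CSV has the month and year columns used in the specs above, and it uses "ordinal" for the x type because "categorical" is not one of Vega-Lite’s field types.

```r
library(vegawidget)  # provides as_vegaspec()

seattle_data <- read.csv("https://calvin-data304.netlify.app/data/weather-with-dates.csv")
seattle_data$mean_daily_temp <- (seattle_data$temp_max + seattle_data$temp_min) / 2

# Build the spec as a nested list; as_vegaspec() converts it to JSON,
# including the data frame stored under data$values.
seattle_mean_temp <- list(
  `$schema` = "https://vega.github.io/schema/vega-lite/v5.json",
  data = list(values = seattle_data),
  height = 400,
  width = 500,
  title = "Mean Temperature Per Month",
  mark = "circle",
  encoding = list(
    x = list(field = "month", type = "ordinal", title = "Month"),
    y = list(
      field = "mean_daily_temp",
      type = "quantitative",
      aggregate = "average",
      title = "Temperature (F)"
    ),
    color = list(field = "year", type = "nominal", title = "Year"),
    fillOpacity = list(value = 0.8),
    size = list(value = 80)
  )
) |> as_vegaspec()

seattle_mean_temp  # print the spec so the chart is displayed
```

Keeping the spec as a list avoids quoting mistakes in the pasted string, and the final line matters: assigning the result to a name does not display it.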
'
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"
},
"transform": [
{
"calculate": "(datum.temp_max + datum.temp_min) / 2",
"as": "daily_mean"
},
{
"timeUnit": "month",
"field": "date",
"as": "month"
},
{
"aggregate": [
{
"op": "average",
"field": "daily_mean",
"as": "mean_of_means"
}
],
"groupby": ["month"]
}
],
"height": 400,
"width": 500,
"title": "Mean of Mean Temperatures Per Month",
"mark": "circle",
"encoding": {
"x": {
"field": "month",
"type": "temporal",
"timeUnit": "month",
"title": "Month"
},
"y": {
"field": "mean_of_means",
"type": "quantitative",
"title": "Mean of Daily Means (F)"
},
"color": {
"field": "year",
"title": "Year"
},
"fillOpacity": {
"value": 0.8
},
"size": {
"value": 80
}
}
}' |> as_vegaspec()
I can’t understand why there is an issue with this either. I thought I was calculating the mean daily temp and then aggregating that by month the same way I aggregated with the temp_max.
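Two things in the spec above may explain the problem. First, an aggregate transform keeps only the groupby fields and the newly aggregated fields, so after grouping by month alone the year field that the color encoding asks for no longer exists. Second, CSV values may reach the calculate step as strings, in which case + would concatenate them instead of adding them. The sketch below is untested against this dataset. It groups by both month and year and wraps the temperatures in toNumber() as a precaution, and it assumes the CSV has date and year columns, as the specs above do.

```r
library(vegawidget)  # provides as_vegaspec()

'
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"
  },
  "transform": [
    {
      "calculate": "(toNumber(datum.temp_max) + toNumber(datum.temp_min)) / 2",
      "as": "daily_mean"
    },
    {"timeUnit": "month", "field": "date", "as": "month"},
    {
      "aggregate": [
        {"op": "average", "field": "daily_mean", "as": "mean_of_means"}
      ],
      "groupby": ["month", "year"]
    }
  ],
  "height": 400,
  "width": 500,
  "title": "Mean of Mean Temperatures Per Month",
  "mark": "circle",
  "encoding": {
    "x": {"field": "month", "type": "temporal", "timeUnit": "month", "title": "Month"},
    "y": {"field": "mean_of_means", "type": "quantitative", "title": "Mean of Daily Means (F)"},
    "color": {"field": "year", "type": "nominal", "title": "Year"},
    "fillOpacity": {"value": 0.8},
    "size": {"value": 80}
  }
}' |> as_vegaspec()
```

With year in the groupby, the aggregate produces one row per month and year, which matches the one-circle-per-month-per-year layout of the temp_max graphic.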
Visualizing Weather 2
Exercise 3
Create a graphic that shows how the different types of weather (rain, fog, etc.) are distributed by month in Seattle. When is it rainiest in Seattle? Sunniest?
Graph #1
'{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"
},
"height": 400,
"width": 500,
"title": "Seattle Weather Type Distribution (2012-2015)",
"mark": "bar" ,
"encoding": {
"x": {"field": "month", "type": "temporal", "timeUnit": "month","title": "Month"},
"y": {"field": "weather", "type": "quantitative", "aggregate": "count","title": "Count of Days"},
"color": {"field": "weather", "title": "Weather Type"},
"xOffset": {"field": "weather"}
}
}' |> as_vegaspec()
This graph shows the differences between the weather types through a grouped bar graph, with the bars for each weather type placed side by side within each month. I can clearly see that in January, the counts rank in the order 1. sun, 2. rain, 3. snow, 4. fog, and 5. drizzle. The only issue with this graph is that it doesn’t help much with evaluating whether one month has more or less of a specific weather type than another month. For example, I can’t tell if July’s fog count is the same as October’s fog count. I hope to fix this issue with the next graph.
Graph #2
'{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"
},
"height": 200,
"width": 175,
"title": "Seattle Weather Distribution (2012-2015) Faceted By Weather Type",
"mark": "bar" ,
"encoding": {
"x": {"field": "month", "type": "temporal", "timeUnit": "month","title": "Month"},
"y": {"field": "weather", "type": "quantitative", "aggregate": "count","title": "Count of Days"},
"color": {"field": "weather", "title": "Weather Type"},
"xOffset": {"field": "weather"},
"facet": {"field": "weather", "columns": 3, "title": ""}
}
}' |> as_vegaspec()
This graph facets by weather type to highlight comparisons between months for each specific weather type. For example, I can see that over the 4-year span from 2012 to 2015, September had a larger overall number of sunny days than any other month. I can also see that July, September, and November had no snow. The only drawback of this graph is that I can’t easily see the differences between weather types. I can’t immediately tell the difference between drizzle and fog counts for any month.